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APPROXIMATION OF FUNCTIONS OF LARGE MATRICES WITH 
KRONECKER STRUCTURE 

MICHELE BENZI* AND VALERIA SIMONCINlt 

Abstract. We consider the numerical approximation of f(A)b where b E R N and A is the sum 
of Kronecker products, that is A = M 2 <S> I 1 <S> Mi E W NxN . Here / is a regular function such 
that f(A) is well defined. We derive a computational strategy that significantly lowers the memory 
requirements and computational efforts of the standard approximations, with special emphasis on the 
exponential function, for which the new procedure becomes particularly advantageous. Our findings 
are illustrated by numerical experiments with typical functions used in applications. 
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1. Introduction. We consider the problem of approximating 


Mb 


(1.1) 


where / is a sufficiently regular function defined on the spectrum of A (see [20j), and 


A = M 2 ®I + I®M 1 e R NxN 


( 1 . 2 ) 


is the sum of Kronecker products with Mi £ ffi™ 1 * 711 5 M 2 £ K™ 2 *™ 2 so that N = nin 2 , 
b = vec(-B) with B a matrix of low rank with dimensions compatible with that of b. 
The Kronecker (or tensor) product of two matrices X and Y of size n x x m x and 
n y x m y , respectively, is defined as 


Xu Y 

X12 Y ■■ 

x lm x Y 

X21 Y 

X 22 Y ■ ■ 

’ % 2 m x Y 

Xn x lY 

a „ x 2 Y ■ ■ 

* %n x m x Y 


the vec operator stacks the columns of a matrix X = [x \,..., x m ] £ K" xm one after 
the other as 


vec(X) = 


Xi 


T>nmx 1 


The problem of approximating (ED for general A is very important in several ap¬ 
plications and has long attracted considerable attention; we refer the reader to [2D] 
and to [301 for comprehensive treatments of the problem and for many ways of nu¬ 
merically approximating its solution. For the case when A has large dimensions, new 
effective approaches have been devised, making the use of matrix function evaluations 
an important tool for solving large scale (three-dimensional) scientific and engineering 
problems involving discretized partial differential equations; see, e.g., [HE [23]. In 
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particular, the Kronecker structure above arises whenever the domain is a rectangle 
or a parallelepiped and finite difference or certain low-order finite element methods 
are employed to discretize differential equations with separable coefficients; see, e.g., 
Eli and references therein. Other applications leading to matrices with Kronecker 
sum structure include image processing [T9], queueing theory ED Chapter 9], graph 
analysis Qj Chapter 3.4], and network design [36j . 

A significant body of literature is now available on efficient numerical methods 
for approximately evaluating the product of f(A) times a vector 6 , using particular 
spectral properties of A and under certain regularity conditions on /. To the best of 
our knowledge, the computational advantages of exploiting, for a general function /, 
the possible Kronecker structure of A have not been addressed in the context of Krylov 
subspace methods for large-scale problems of the form (O). By taking into account 
this structure, and also the possible low rank of B, the computational setting changes 
significantly. We will show that the memory requirements can be drastically reduced: 
in fact, we show that by preserving the structure of the problem, faster convergence 
and significantly lower memory requirements can be achieved. More precisely, we 
acknowledge that the approximation to functions of A is the composition of distinct 
approximations in terms of Mi and M 2 , which are much smaller matrices. Similar 
considerations can be made for other properties of functions of matrices that are 
Kronecker sums, as is the case for their sparsity and decay patterns; see [ 6 ] for a 
recent analysis. 

Our results strongly rely on the low rank of the matrix B. In fact, but without 
loss of generality, we shall assume that B has rank equal to one, so that we can write 
B = bibT), b± € R ni , 62 £ K" 2 . For larger rank i <C min{ni,ri 2 }, we could still 
write B = BiB^ and proceed in a similar manner. Our results also apply when B 
is numerically low rank, that is, only a few singular values of B are above machine 
precision, or some other small tolerance. In this case, we could write b = b + b e where 
b = vec (B) with B of low rank, and || 6 e || <C 1. If ||/(-4)|| is not too large, 

f(A)b = f(A)b + f(A)b e &f(A)b. 

An outline of the paper is as follows. In section [2] we review some standard 
techniques for approximating 0 > when A is large, and set up the notation for the rest 
of the paper. In section[3]we derive the structure-exploiting approximation for general 
functions / such that f(A) is well defined. In section [4] we focus on the exponential 
function, for which the new procedure becomes particularly advantageous. Another 
important special case, the matrix inverse, is briefly discussed in section [5] and more 
general matrix functions in section]!)] Conclusions are given in section[7] Our findings 
are illustrated by numerical experiments with typical functions used in applications. 

2. General approximation by projection. A common procedure for large A 
constructs an approximation space, and a matrix V whose orthonormal columns span 
that space, and obtain 

f{A)b m Vf(H)e, H = V t AV, e = V T b. (2.1) 

Depending on the spectral properties of the matrix A and on the vector fo, the approx¬ 
imation space dimension may need to be very large to obtain a good approximation. 
Unfortunately, the whole matrix V may need to be stored, limiting the applicabil¬ 
ity of the approach. This is the motivation behind the recently introduced restarted 
methods, which try to cope with the growing space dimensions by restarting the 
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approximation process as soon as a fixed maximum subspace dimension is reached 

mm- 

A classical choice as approximation space is given by the (standard) Krylov sub- 
spactf] K m (A , b) = span{&, Ab ,..., A m ~ l b}. An orthonormal basis {rq,..., v m } can 
be constructed sequentially via the Arnoldi recurrence, which can be written in short 
as 


'\^ rn H rn T hv rn ^.\€ rn , V m [rq,..., u m ], 

here e m is the mth vector of the canonical basis of R m , H m = (as stated 

earlier), and h = \\Av m - J2iLi[ H rn\imVi\\. 

The past few years have seen a rapid increase in the use of richer approximation 
spaces than standard Krylov subspaces. More precisely, rational Krylov subspaces, 
namely 

m— 1 

K m (A, b,a m -i) =span{&, (A - a 1 I)~ 1 b, ..., {A - a i I)~ 1 b}, ( 2 . 2 ) 

2=1 

have been shown to be particularly well suited for matrix function approximations; 
we refer the reader to |16] for a recent survey on various issues related to rational 
Krylov subspace approximations of matrix functions. A special case is given by the 
extended Krylov subspace, which alternates powers of A with powers of A - 1 nine]. 

3. Exploiting the Kronecker structure. Assume that A has the form in <o 
and that, for simplicity, B has rank one, that is B = bib^■ We generate distinct ap¬ 
proximations for the matrices M\ and M 2 ; in the case of the classical Krylov subspace 
these are given as 

K m (Mi,bi), M 1 Q rn = Q m T 1 + q rn+1 t ( - 1 ' ) e^ n 

and 

A m (M 2 , 62 ), M 2 P m = P m T2 + Pm+it l '~' > e^- 

Note that the two spaces could have different dimensions; we will use the same di¬ 
mension for simplicity of presentation. The matrices Q m and P m have orthonormal 
columns, and have a much smaller number of rows than V m (the square root of it, if 
n 1 = n 2 ). We thus consider the following quantity to define the approximation: 

V = Pm ® Qm 


so that 

AV = A{Pm ® Qm) = M 2 P m ® Qm + P m ® M^Q rn 

= ( P m T 2 <g> Qm + Pm <g> Q m Ti) + p m+ it (2) e^ ® Qm + Pm ® ?m+ll (1) e^ 

= (Pm ® Qm)(T 2 ® / m + / m ® Ti) + low rank. 

Following the general setting in (ED, and defining T m = T 2 ® I m + I m ® ?i we thus 
consider the approximation 

f(A)b S3 := (Pm ® Qm)z, z = f(Tm)(Pm®Qm) T b. (3.1) 

1 In case b is a matrix, the definition of a “block” Krylov subspace is completely analogous, that 
is Km{A , b) = range([6, Ab ,..., A rn ~ 1 b]). 
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We stress that the matrix does not need to be explicitly computed and stored. 

Indeed, letting Z G K mxm be such that 2 = vec (Z), it holds that x® = vec (Q m ZPff)\ 
moreover, (P m <g) Q m ) T b = vec((Q^&i)(&]fP m )). The following proposition provides 
a cheaper computation in case both Tj and T 2 are diagonalizable, as is the case for 
instance when they are both symmetric. 

Proposition 3 . 1 . Assume that the matrices Pi,P2 are diagonalizable, and let 
T\ = XA.X -1 , T 2 = YQY- 1 be their eigendecompositions. Let 

9 = /(© ® Im + Im « A)vec(X- 1 Q^fo 1 6^P m y- r ) e K" 1 '. 

With the notation and assumptions above, for G such that g = vec(G) it holds that 

x® = vec (Q m XGY T pT). 


Proof. Using the properties of the Kronecker product (see, e.g., [24] Corollary 
4.2.11 and Theorem 4.4.5]), the eigendecomposition of T m = T 2 0 I + I 0 Pi is given 
by 


t 2 0 / +1 0 Ti = (y 0 x)(© 0 / m + j m ® A)(y _1 ® X” 1 ), 

so that f(Tm) = (Y 0-X")/(0 0 I m + / m 0 A)(F _1 0X _1 ), where /(00/ m + / m 0 A) 
is a diagonal matrix. The result follows from explicitly writing down the eigenvector 
matrices associated with each Kronecker product; note that /(0 ® / m + I m ® A) can 
be computed cheaply as both 0 and A are diagonal. 0 

In addition to providing a computational procedure for determining x®, Proposi¬ 
tion m reveals that, in exact arithmetic, the true vector x = f{A)b can be obtained 
using information from spaces of dimension at most m = max{ni,ri 2 }, whereas the 
standard approximation x m may require a much larger dimension space. This fact is 
due to both the Kronecker form of A and the composition of b , as b corresponds to the 
“vectorization” of the rank-one matrix bibJf . The following examples illustrate this 
property, while more explicit formulas can be obtained for the exponential function, 
as we will describe in section [4] 

Example 3.2. We consider /(x) = \J x and M\ = M 2 = —tridiag(l, —2,1) e 
M" xn , n = 50, each corresponding to the (scaled) centered three-point discretization 
of the one-dimensional negative Laplace operator in (0,1). We first consider b\ equal to 
the vector of all ones, and b 2 a vector of random values uniformly distributed in (0,1); 
the results are shown in Table o We observe that convergence is faster, in terms 
of space dimension, for x®. Moreover, once subspaces of dimension m = n = 50 are 
reached, a rather accurate approximation is obtained with the structure-preserving 
approach, as the full eigenspace of Mi is generated. We next consider the case of 
b 2 = bi (the vector of all ones) and report the numerical experiments in Table 13.21 
Convergence is even faster in the structure preserving method, as apparently conver¬ 
gence is faster with b 1 than with the original b 2 . No major difference is observed in 
the standard procedure. 

Example 3.3. Data for this example are taken from [13|. We consider the 
function f(z) = (e s% ^ — 1 )/z with s = 10 -3 , and A = M 0 I + I ® M, where 
M is the n x n tridiagonal matrix of the finite difference discretization of the one¬ 
dimensional Laplace operator; b := bi = b 2 is the vector of all ones. Table 13.11 
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m 

\\f(A)b-x m \\ 

\\f(A)b-x®\\ 

5 

1.4416e+00 

9.6899e-01 

10 

5.2832e-01 

2.7151e-01 

15 

2.2517e-01 

8.4288e-02 

20 

9.9517e-02 

1.8327e-02 

25 

4.0681e-02 

8.5632e-03 

30 

1.5114e-02 

2.7162e-03 

35 

9.0086e-03 

5.3891e-04 

40 

6.3515e-03 

1.9269e-04 

45 

3.5355e-03 

1.9476e-05 

50 

1.7627e-03 

6.4440e-13 


Fig. 3.1. Example \3.2\ for f(x) = <Jx. 
b 2 ^ bi. 


m 

\f(A)b- x m \\ 

|/(.4)6-a;®|| 

5 

1.9371e+00 

1.5903e+00 

10 

7.5344e-01 

4.5636e-01 

15 

3.3417e-01 

1.3538e-01 

20 

1.4240e-01 

2.5706e-02 

25 

5.1205e-02 

1.1719e-12 

30 

1.2671e-02 

1.1034e-12 

35 

5.1316e-03 

1.4357e-12 

40 

1.7854e-03 

l.U86e-12 

45 

6.2249e-04 

1.2297e-12 

50 

1.8720e-04 

1.2975e-12 


Fig. 3.2. Examvle 13.31 for f(x) = <Jx. 
b 2 = b 1 . 


m 

\\f(A)b-x m \\ 

||/(^l)6-x®|| 

H^m ^m,old || 



II*SII 

4 

4.2422e-01 

3.9723e-01 

1.0000e+00 

1.0000e+00 

8 

2.6959e-01 

2.1025e-01 

2.2710e-01 

2.5313e-01 

12 

1.7072e-01 

1.0365e-01 

1.3066e-01 

1.2971e-01 

16 

1.0324e-01 

4.2407e-02 

8.3444e-02 

6.9960e-02 

20 

5.7342e-02 

1.1176e-02 

5.4224e-02 

3.3969e-02 

24 

2.7550e-02 

4.8230e-04 

3.4054e-02 

1.0935e-02 

28 

1.0351e-02 

2.8883e-12 

1.9296e-02 

4.8230e-04 

32 

3.4273e-03 

2.8496e-12 

8.3585e-03 

1.1366e-13 

36 

2.2906e-03 

2.9006e-12 

1.7514e-03 

1.4799e-13 

40 

9.4368e-04 

2.8119e-12 

1.6283e-03 

2.7323e-13 

44 

4.3935e-04 

2.7593e-12 

6.2797e-04 

2.1786e-13 

48 

1.8744e-04 

2.8235e-12 

3.0332e-04 

2.5965e-13 


Table 3.1 

Examvle \3.3l For f(x) = ( e s A* — 1 )/z with s = 10~ 3 . Here n = 50. 


shows the approximation history of the standard method and of the new approach for 
n = 50. Because of the small size, we could compute and monitor the true error. In 
the last two columns, however, we also report the relative difference between the last 
two approximation iterates, which may be considered as a simple-minded stopping 
criterion for larger n; see, e.g., [55] or m for more sophisticated criteria. The results 
are as those of the previous examples. In Table 13.21 we report the runs for n = 100, 
for which we could not compute the exact solution, so that only the error estimates 
are reported. The results are very similar to the smaller case. In this case, memory 
requirements of the structured approximation become significantly lower than for the 
standard approach. 

4. The case of the matrix exponential. The evaluation of (11.11) with f(A) = 
exp(„4) presents special interest owing to its importance in the numerical solution of 
time-dependent ODEs and PDEs EU [22J [23]. The problem also arises in network 
science, when evaluating the total communicability of a network Emu. 

The exponential function provides a particularly favorable setting in the case of 
a matrix having Kronecker form. Indeed, due to the property (see, e.g., [201 Theorem 
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m 

||*m—*m,oIdll 


Ikmll 

ll*an 

4 

1.0000e+00 

1.0000e+00 

8 

2.3942e-01 

2.7720e-01 

12 

1.5010e-01 

1.6289e-01 

16 

1.0716e-01 

1.0966e-01 

20 

8.1062e-02 

7.8150e-02 

24 

6.3308e-02 

5.7003e-02 

28 

5.0347e-02 

4.1674e-02 

32 

4.0409e-02 

2.9992e-02 

36 

3.2507e-02 

2.0802e-02 

40 

2.6052e-02 

1.3446e-02 

44 

2.0667e-02 

7.5529e-03 

48 

1.6104e-02 

2.9970e-03 

52 

1.2194e-02 

3.1470e-04 

56 

8.8234e-03 

1.1354e-12 

60 

5.9194e-03 

3.4639e-13 


Table 3.2 

Example I ■>. ■>! For f(x ) = (e 3 ^ 3 — l)/z with s = 10~ 3 and & 2 = £> 1 . Here n = 100. 


10.9]) 


exp(T 2 0 / m + I m 0 Ti) = exp(T 2 ) 0 exp(Ti), (4.1) 

formula (El) simplifies even further. Indeed, we obtain 

= (Pm 0 Qm) exp(T2 0 Im H - Im ® ^1 )(Pm ® Qm) b 

= {Pm ® Qm)(exp(T 2 ) ® exp(T!))(P m 0 Q m ) T b 
= vec (Q m exp(Ti)Q^&ifrfP m exp(T 2 ) T P^) 

= vec(xW(a;^ ) ) T ), (4.2) 

with I™ = Q m exp(Ti)<5m6i and x£f = P m exp(T 2 )(P,^& 2 ). We observe that the 
final approximation x® is the simple combination of the two separate approximations 
of exp(M 1 )b 1 and exp(M 2 )b 2 . Indeed, the same approximation could be obtained by 
first writing 

exp(.4)& = exp(M 2 ) 0 exp(Mi)b = vec((exp(Mi)fei)(6^ exp(M 2 ) T ), (4.3) 

and then using the standard approximations exp(Mi)6i ft Q m exp(Ti)Q^bi and 
exp(M 2 )6 2 » P m exp(T 2 )P^6 2 . 

In the following we illustrate the behavior of the approximation to the matrix 
exponential with a few numerical examples. Here the standard Krylov subspace is 
used in all instances for approximating the corresponding vector. We stress that 
because of the decreased memory allocations to generate K m {Mi 1 bi ), the computation 
of x® can afford significantly larger values of m, than when building K m (A, b). 

Example 4.1. We consider the approximation of exp (.4)5 with A as in (11.21) 
and Mi = tridiag(l, — 2,1) and M 2 = tridiag(2, — 3, 2), both of size n\ = n 2 = 70. 
Therefore, A has dimension 4900. Moreover, we take b\ = [1,..., 1] T and b 2 a vector 
with random values uniformly distributed in (0,1). Thanks to the problem size, the 
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Fig. 4.1. Convergence history to exp(.4) b. Left: Example \4 l\ Right: Example \4-2\ 


vector exp(_4)5 could be computed explicitly. Figure [TTJ left) reports the convergence 
history as the space dimension m increases when using two different approaches: the 
first one uses K m (A , b) as approximation space, so that x m = V m exp(iJ m )(V^6) with 
H m = Vj n AV rn \ the second one uses a:® in (14.21) . We observe that the convergence of 
a;® is faster than that of x m \ this fact will be explored in section 1X11 The plot also 
reports the error norm in the approximation of x W and x^: the error norm for a;® 
is mainly driven by that of the most slowly converging approximation between xl2 
and a 42 ■ 

Example 4.2. We modify Example 14.11 bv setting M 2 = tridiag(l, —2,1), while 
Mi to be equal to the discretization by finite differences of the one-dimensional non- 
selfadjoint operator C{u) = u xx — 100w x on the interval [0,1]. Hence, Mi (and there¬ 
fore A) is nonsymmetric. The matrix dimensions and the vectors bi , &2 are as in the 
previous example. The convergence history is reported in the right plot of Figure [XT] 
Similar comments as for Example 14.11 can be deduced. 

Example 4.3. The next example arises in graph and network analysis. Given 
two graphs G 1 = (Vi,Ei) and G 2 = (V 2 , E 2 ), we consider the Cartesian product 
Q = G 1 DG 2 of the two given graphs, defined as follows. The vertex set of Q is just 
the Cartesian product Vi x V 2 , and there is an edge between two vertices (iq, M 2 ) and 
(vi,V 2 ) of Q if either ui = Vi and ( 142 ,^ 2 ) £ -E 2 , or m 2 = V 2 and € Ei. The 

adjacency matrix of Q is then the Kronecker sum of the adjacency matrices of Gi and 
G 2 [Ij page 37]; see also [36] for definitions (and applications) in the case of directed 
graphs. A useful notion in the analysis of complex networks is the total communicabil¬ 
ity , which is defined as the row sum of the exponential of the adjacency matrix, see [5]. 
The entries of this vector provide a measure of the “importance” of the nodes in the 
network, and can be computed as exp(A)6 where now b is the vector of all ones (note 
that the corresponding matrix B has rank one). We consider five Cartesian product 
graphs of the form Qi = GiOGi , with each Gi being a Barabasi Albert graph con¬ 
structed using the preferential attachment model. The command pref in the Matlab 
toolbox contest ]34] was used (with the default choice of parameters) to generate 
five graphs on n nodes, where n = 1000, 2000,..., 5000. Thus, the adjacency matrices 
of the corresponding Cartesian product graphs Qi have dimension ranging between 
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Fig. 4.2. Example \4-3\ Convergence history to exp (A)b. Left: Example \4-3\ case n = 1000. 
Right: Example\f.3\ convergence history for all five cases. 


one and twenty-five millions. All the resulting matrices are symmetric indefinite. 

Table 14.11 reports the CPU time required to compute a basis for the Krylov sub¬ 
space of dimension m = 30 as the graph matrix size increases (all runs were per¬ 
formed with Matlab R2011b [29] on a laptop with Intel Core i7-3687U CPU running 
at 2.10Ghz with 7.7GiB memory). The last column reports the time when A is used, 
so that V m exp(i7 m )ei||6|| is computed; the middle column refers to the case when Mi 
is used, so that a;® in (14.211 is computed. As expected, the CPU time for K m (Mi,bi) is 
several orders of magnitude smaller than for K m (A, b). In the latter case, timings be¬ 
came prohibitive for n = 3, 000, since the generation of the basis for the space entails 
the orthogonalization of vectors in R” . The computational costs remain extremely 
low when computing a basis for K m (Mi,bi). The left plot of Figure l4~2l shows the 
convergence history of the two approaches, in terms of space dimensions, when the 
smallest matrix in the set is used. Convergence is monitored by measuring the dif¬ 
ference between the last two iterates, as done in the previous examples. Once again, 
convergence is faster when the Kronecker form is exploited. The right plot of Figure 
14.21 reports the convergence history of x® for all matrices in the set. All spectra are 
roughly contained in the interval [—15,15], therefore the expected convergence rate is 
approximately the same for all matrices. 


n 

CPU Time 
K m {MiM) 

CPU Time 
K m {A, b) 

1000 

0.02662 

29.996 

2000 

0.04480 

189.991 

3000 

0.06545 

- 

4000 

0.90677 

- 

5000 

0.99206 

- 


Table 4.1 


Example \4-3\ CPU Time for the construction of the Krylov approximation space of dimension 
m = 30 when using either M\ £ R nxn (symmetric) or A = Mi ® I + I <S> Mi E R" xr> ■ Only 
results for the smallest graph matrices A are reported when building Km{A^ b). 
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Remark 4.4. Besides the exponential, the matrix sine and cosine are also well- 
behaved with respect to the Kronecker sum structure. Indeed, the following identities 
hold H Theorem 12.2]: 


sin(Afi ® M 2 ) = sin(Mi) ® cos (M 2 ) + cos(Afi) ® sin(M 2 ) 


(4.4) 


and 


cos(Mi ® M 2 ) = cos(A4i) ® cos(Af 2 ) — sin(Mi) ® sin(A4 2 ). (4.5) 


These identities can be exploited to greatly reduce the computational cost and storage 
requirements for the evaluation of f{A)b when / is either the sine or cosine or a 
combination of these functions. 

4.1. Convergence considerations. An expression for the error can be deduced 
by using the form in (14.31) . Indeed, letting = exp(A4i)6i and x^ = exp (M 2 )b 2 , 
and also X = x^ 1 \x^) T and X ® = Xm\x$) T , it holds 


||ex P (A)6 -x®|| = ||A-X®|| F 



Therefore, the error norm in the approximation to exp(A)6 is bounded by the errors 
of the two separate approximations with M\ and M 2 . The relation (14.61) can also be 
used for deriving a priori convergence bounds for exp(A)6 in terms of the bounds for 
M\ and M 2 . We will give such bounds in the case M\ and M 2 are Hermitian and 
positive definite. The following result was proved in [211- 

Theorem 4.5. Let M be a Hermitian positive semidefinite matrix with eigenval¬ 
ues in the interval [0,4p]. Then the error in the Arnoldi approximation o/exp(rM)u 
with ||w|| = 1, namely, e m := || exp(— tM)v — V m exp(—rT m )ei||, is bounded in the 
following ways: 

i) £m < 10 exp(—m 2 /(5pr)) ; for pr > 1 and y/4pr < m < 2 pr; 

ii) E m < 10(pr) _1 exp(-pr) (^) m for m > 2 pr. 

We next show how Theorem 14.51 can be used to compare the difference in con¬ 
vergence rates between x m and a;®. To simplify the presentation, we assume that 
M = Mi = M 2 , and that bi = b 2 , with ||6i|| = 1. We refer the reader to [H [25] for 
estimates similar to those of Theorem 14.51 

It can be shown that if A i, i = 1,... ,n are the eigenvalues of M (in decreasing 
order), then the n 2 eigenvalues of A are given by A i + A j, i,j 6 {1,... ,n}; see, e.g., 
[24l Theorem 4.4.5]. Therefore in particular, the largest and smallest eigenvalues of 
A equal 2Ai and 2A n , respectively. If we apply Theorem 14.51 to M — A„/ for r = 1, 
then we obtain that for m large enough the error is bounded as 




On the other hand, Theorem 14.51 applied to A — 2A n I yields 
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The ratio between the two bounds is given by 

ex P (I) 

2 m —1 ’ 

which is in favor of the computation with M for small p. In case p is very large, both 
methods become very slow. 

5. The case of the matrix inverse. Also very important in applications is 
the case of the inverse, f(A) = A _1 where A has Kronecker sum structure. The 
solution of linear systems of the form Ax = b arises in many applications (PDEs, 
imaging, Markov chains, networks, etc.) and Krylov subspace methods are widely 
used, typically together with preconditioning, to solve such systems. The Kronecker 
structure can be readily exploited if b is the result of the vectorization of a low rank 
matrix. For simplicity, let us assume that b = vec(&i&]f). Then the system Ax = b is 
equivalent to the following Sylvester equation (see, e.g., [Mj Sec. 4.4]): 

MiX + XM% = M 2 , x = vec(A). (5.1) 

Numerical methods that exploit the small size of Mi,M 2 can be used to solve the 
linear matrix equation in (15.11) . If n\ and 712 are of order up to a few thousands, then 
the Bartels-Stewart algorithm can be used j2]. Otherwise, X can be approximated 
by using different approaches, depending on the relative size of ni and 712 ; we refer 
the reader to for detailed discussion of the available methods and the related 
references. Here we briefly describe the idea of approximate solution by projection 
onto an appropriate subspace, which will be used in section 16.21 For simplicity of 
exposition we assume M 2 = Mi and 62 = 61 ; if this is not the case, straightforward 
modifications can be included; see [32] . If the orthonormal columns of P m (= Q m ) are 
a basis for the considered subspace of 1 " of dimension m , then an approximation to 
X is sought asl« X m = P m Y m P^, where Y m £ R mxm j g determined by imposing 
additional conditions. A common strategy consists of imposing that the residual 
R m := M\X m + X m Mi — bibi be orthogonal to the generated subspace, that is, 
(Pm ® Pm.) T vec(P m ) = 0, or, in matrix terms, P^R m P m = 0, where a zero matrix 
appears on the right-hand side. Substituting in this last equation the definition of 
R m and X m , gives 

P^M 1 P m Y m P^P m + P^P m Y m P^M^ P m - P^ l b 1 b^P m = 0. 

Recalling that P)^P m = I and that P^MiP m =T\, we obtain the small scale linear 
equation 

T\Ym + Y m Ti — b±bi =0, b\ = P^bi, 

whose solution yields Y m . In particular, 

\\x-x®\\ = \\X-P m YmPl\\F, (5.2) 

where || • ||p denotes the Frobenius norm. While we refer to [32] for a detailed analysis, 
here we notice that the approximate solution x® = vec(X m ) to f{A)b = A~ l b can be 
written in the more familiar form 

x® = vec(P m K m P^) = (P m ® P m )vec(Y m ) 

= (P m < 8 > P m )(Tj ® / + / (g> Ti) - 1 (P m ® P m ) T b. (5.3) 

This form will be used in section 16.21 to express the approximation error of Cauchy- 
Stieltjes functions. 
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6. Completely monotonic functions. Both the matrix exponential (in the 
form f(A) = exp(—A)) and the inverse are special cases of an important class of 
analytic functions, namely, the completely monotonic functions [ 55 j . We recall the 
following definitions. 

Definition 6 . 1 . Let f be defined in the interval (a, b) where —oo < a < b < +oo. 
Then, f is said to be completely monotonic in (a, b) if 

{—l) k f < ' k \x)>0 for all a<x<b and all k = 0,1,2,... 

Moreover, f is said to be strictly completely monotonic in (a, b) if 

(—l) k f( k \x )>0 for all a<x<b and all k = 0,1,2,... 

Here /^ denotes the fcth derivative of /, with /(°) = /. 

An important theorem of Bernstein states that a function / is completely mono¬ 
tonic in ( 0 ,oo) if and only if / is the Laplace-Stieltjes transform of a(r); 

/•OO 

/(*)=/ e~ TX da(r), (6.1) 

Jo 

where a(r) is nondecreasing and the integral in ( 16 . 11 ) converges for all x > 0 . See 
[o 5 l Chapter 4 ]. For this reason, completely monotonic functions on ( 0 ,oo) are also 
referred to as Laplace-Stieltjes functions. 

Important examples of Laplace-Stieltjes functions include: 

1. fi(x) = 1 /x = / Q °° e~ XT da\{r) for x > 0, where <2 i(t) = r for r > 0. 

2. f2{x) = e~ x = / 0 °° e~ XT da2{r) for x > 0, where 0:2 (t) = 0 for 0 < r < 1 and 
«2 (t) = 1 for t > 1. 

3. fo(x) = (1 — e~ x )/x = / 0 °° e~ XT das(T) for x > 0, where <23(7-) = r for 
0 < r < 1, and <23 (r) = 1 for t > 1. 

Also, the functions x~ a (for any a > 0 ), log(l + l/a;) and exp(l/x), are all strictly 
completely monotonic on ( 0 ,00). Moreover, products and positive linear combinations 
of strictly completely monotonic functions are strictly completely monotonic. 

Formula ( 16 . 11 ) suggests the use of quadrature rules to approximate f{A)b when A 
is a Kronecker sum and / is strictly completely monotonic on ( 0 , 00): 


f(A)b 



<? 

exp(— TA)bda(r) ''jF Wk exp(— TkA)b, 

k= 1 


( 6 . 2 ) 


where T\,... ,t^ £ (0, 00 ) are suitably chosen quadrature nodes and w±,... ,Wk £ R 
are the quadrature weights; see, for example, DU Sec. 5]. As shown in the previous 
section, the Kronecker sum structure can be exploited in the computation of the 
individual terms exp(— TkA)b. Also note that each contribution to the quadrature 
can be computed independently of the others, which could be useful in a parallel 
setting. This approach could be especially useful in cases where convergence of the 
Krylov subspace approximation is slow. 

In the case of interpolatory quadrature and Hermitian matrices, the absolute error 
\\f(A)b— X)fe=i w k exp(— TkA)b\\ (in the 2-norm) is easily seen to be bounded by e||&||, 
where e is defined as 


e = max 



9 

exp(-r(Ai + Hj)) da(r) - ^ w k exp( 

fc=1 


T k (Xi + Hj)) 
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and \i, fij range over the spectra of M 1; M 2 , where A = M 2 ® I + I ® M\. In the 
case = M 2 = M = M* , the error can be bounded by 

poo Q 

e < max / exp(—r(A + p)) da(r) — E w k exp(-r fe (A + p))\ , 

[A m j n , A m ax] J Q ^ 

where A m i n , A max are the extreme eigenvalues of M. For additional discussion of error 
bounds associated with the use of quadrature rules of the form (16.21) . see [T7J Sec. 5.7]. 

Analogous considerations apply to more general types of functions. For a function 
/ analytic inside a contour T £ C containing the eigenvalues of A in its interior and 
continuous on T we can write 


f (A) = ^- i j^ f ( z )(A- zI) - 1 d z . 

Quadrature rules can be used to obtain approximations of the form 

1 P 9 

f(-A)b=— / f(z)(A-zI)~ 1 b<lzK.'^w k (A-z k I)- 1 b 1 

111 -' r fc=i 

requiring the solution of the q linear systems (A — z k I)x = b , possibly in parallel for 
k = 1,... ,q. We refer to [T5j for details on how to apply this technique efficiently. 
Again, the Kronecker sum structure of A, if present, can be exploited to greatly reduce 
the computational cost and storage requirements. In particular, if b = vec(6i6|’), then 
according to section 0 each system (A — z k I)x = b is equivalent to solving the linear 
matrix equation (Mi — z k I)X + XMj = bib 2 , with x = vec(A). 

Another important class of functions is given by the Cauchy-Stieltjes (or Markov- 
type) functions, which can be written as 

/W=/hM, 26 C \r, 

Jr z-u 

where 7 is a (complex) measure supported on a closed set T C C and the integral 
is absolutely convergent. This class is closely related to, but distinct from, the class 
of Laplace-Stieltjes functions; see [35] Chapter VIII] for a general treatment. In this 
paper we are especially interested in the particular case T = (— 00 , 0 ] so that 

f{x)= j ieC\(-oo,0], (6.3) 

J -00 x-u 


where 7 is now a (possibly signed) real measure. Important examples of Cauchy- 
Stieltjes function that are frequently encountered in applications (see [16]) include 




-00 Z - U 7 TV OJ 


=dw, 


,— ty/5 _ | 


log(l + z) 


> —00 

P-1 


1 sin(t v / — ui) 
z — oj — 7 rw 

—EEdw. 


dw, 




— OO 


z — UI (—w) 
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6.1. Convergence analysis for Laplace Stieltjes functions. For Laplace- 
Stieltjes functions and symmetric positive definite matrices, in this section we analyze 
the convergence rate of the approximation obtained by exploiting the Kronecker form. 
Moreover, we compare this rate with that of the standard approximation with A, 
and that of the approximation of Mi. We will mainly deal with standard Krylov 
approximations, as error estimates for the exponential are available. A few comments 
are also included for rational Krylov subspaces. 

6.1.1. Analysis of Krylov subspace approximation. In this section we show 

that the convergence rate of the approximation when using x®, is smaller than that 
with x m = Vm/(U m )V^ 6 . Moreover, it is also smaller than the convergence rate 
of the approximation x'm to For simplicity of exposition we assume that 

Mi = M 2 and bi = 62 , with || 6 i|| = 1 . 

Proposition 6.2. Let f be a Laplace-Stieltjes function, and x® = ( P m ® 
Pm)f(Tm){Pm ® Pm) T b be the Kronecker approximation to f(A)b. Moreover, let 
x^ 1 ) = exp(— rMi)bi and im = P m exp(—rT 1 )P^6 1 . We also define the scaled quan¬ 
tity x^ 1 ) := e -AminT x < - 1 - ) = e - ( Ml+Amin/ ) r 5 1; - analogously for . Then 

pOO 

||/(A)6 -x®|| < 2 / ||x (1 ) -xW||da(r). 

Jo 

Proof. Recalling the notation of (|4.2[) and ()4.3|) leading to (|4.6[) , we have 

poo 

\\f(A)b — x® || = || / (e- TA b-(P m ®P m ) e - TT ™(Pm®P m ) T b) da(r)|| 

Jo 

p 00 

< / (lk (1) H + Ikmll) ll 2;(1) 

Jo 

p 00 

< / 2e" Ami ” T ||x (1) -xW||da(r) 

Jo 

poo 

= 2 || e -r(M 1+ X min I) bi _ p me -r(Ti+X min I)pT bi || da ( T ) 

Jo 

poo 

= 2 / ||x (1 ) -xW||da(r), 

Jo 

where in the last inequality we have used A m i n (A/i) < A m i n (Ti), so that 

Ikm^ll < e _Ami n( Tl ) T < g-Amm(Mi)iy 


□ 

We remark that the extra shift in the matrix Mi is what makes the solution x® 
converge faster than x™ . 

In light of Proposition ^. 21 bounds for the error norm can be found by estimating 
the error norm in the approximation of the exponential function under the measure 
da. Depending on the function a(r), different approximation strategies need to be 
devised. Here we analyze the case da(r) = -p^dr for some 7 > 0; for instance, the 
function /(x) = l/^y/x falls in this setting with 7 = 3/2. Then 

\\f(A)v - x® || <2 f — ||x (1) - x^P ||dr. 
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The case 7 = 0 is special. Since r 7 = 1, the integral on the right-hand side can 
be bounded as in [35J proof of (3.1)], so that it holds 


\\f(A)v — x® || < 2 


Vk + 1 / yfn — 1 

AminV^ \ y/~fc + 1 


where k — (A max T A m i n )/(A m i n T Amin). 

We focus next on the case 7 > 0. We split the integral as 


37P (1> -£mll dr = / 37P (1) -Zmll d ' r + y\ ! ^P (1) -Xm\\ dT 

4 P 


r 7 


70 T 7 
=: + ^ 2 - 

Lemma 6.3. With the previous notation, for 7 > 0 it holds 


(6.4) 


I 2 < 


4p \ 7 \/k + 1 / \/k — 1 


! 7 X 

/ /v min 


Vk +1 


where n — (A max T A m i n )/(A m i n -t- A m i n ). 

Proof. We note that in the given interval r 7 > , so that 

4p \ 7 ^ 


Oil 14 


rrr 


7 poo 


|^)-x«||dr< 

ra z 


W-^lldr. 


Following [33] Prop.3.1], an explicit bound for the last integral can be obtained, from 
which the bound follows. □ 

The derivation of an upper bound for I\ in (16.41) is a little more involved. 
Lemma 6.4. With the previous notation, for 7 > 0 it holds 

m —'y 


h< 101 - r—v 


p \ m' 


2A n 


, 2A m in + P 

7 I m — 7 ,--- m 


2 P 


V 2 A m i n / 


where 7 zs the lower incomplete Gamma function. 

Proof. We observe that the quantity HaK 1 ) — Xm\\ in h can be bounded by using 
Theorem 14.51 (see |2T|). hence we further split 7i as 


h = f ' P (1) - || dr + [ P - x$ || dr. 


(6.5) 


Theorem 14.51 can be applied to positive semidefinite matrices. Therefore we write 
e —(Mi+A m in/)T _ g— 2 A min Tg—(Afi—A min /)r^ w j^h _ \ min J positive semidefinite. For 
the first integral in (16.51) we thus have (see Theorem 14. 51 iill 


o-2A„ 


2P — ||z« -S$||dT= [ 2P e || e -( Ml -A ml nJ)r 6 _ g, P m ) e -(Tn-A m ,nJ)r-g|| dT 

Jo 


r 7 1 


'0 
< 10 


r 7 


j* 2p g (2A m i n +p)T 

( epr\ 

L Px 1+1 

V rn ) 


dr 


/ " P r m - 7 - 1 g-( 2Am m+P) T dr 

o \m) 


= 10-1 —J / t- 

p\m) J o 


= 10 


1 /epy 

p \ m ) V. 2A m i n -1- p 


m —7 


2A m i n + p 

7 I m — 7 ,-m 

' 1 ' 2p 















Functions of matrices with Kronecker structure 


15 


For the second integral in (16.51) . after the same spectral transformation and also 
usin s ft < (m/( 2 p))f for T e [^> t£], we obtain 


i P i 
n rT ' 


c (1) -x^lldr < 10 


4 p g SAminT m 2 


< 10 


2 P 

(2p 

V m 




e 5 p t dr 


^ 2 f^r 1 


m T 2 

2 P 


9 \ w 

-e min 5 PT dr 


< 10 7 " 1/2 


V m 


m 2 

f —e _ 2 AminT_ ^dr. 
o r 2 


We then use jT5[ formula 3.471.15], namely J^° x @ lX * dx = 2v/,Sl ^ 2 , to 

finally write 


10 


7 — 1/2 n oo 


" 5 p t dT = 10 


/o r 2 


7 - 1/2 


7T _ 2 d mm 

e 




2 Ami 


□ 

By collecting all bounds we can prove a final upper bound for the error, and give 
its asymptotic convergence rate. To this end, we first need the following technical 
lemma, whose proof is given in the appendix. 

Lemma 6.5. For 0 < x < an with 0 < a < 1 it holds that 

1 r n 

7 (n,x) < - 1 — e~ x . 

1 — an 


Theorem 6.6. For 7 > 0 and with the notation above, it holds that 
\\f(A)v-x®\\<2(I 1+ I 2 ) 



= O ^exp ^ for m and re large enough. ( 6 . 6 ) 


Proof. We only need to show that the term involving 7 is asymptotically bounded 
above by O (exp ''j for m and re large. Simple calculations show that the second 


argument of 7 satisfies (2A m in + p)/(2p)m < am with a < 1 for A m ax/A m in > 9; 
the larger this eigenvalue ratio, the smaller a , so that for a large ratio, the bound 
(2A m ; n + p)/{2p)m < a{m — 7 ) also holds. Hence, we can use Lemma [6.51 to write 
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7 (n, x)«e x Thereforc0, 



Writing down p = |(A max — A m i n ), and after a few algebraic calculations we obtain 

2A min + (In4 — l)p __ 1 (In4 - l)A ma , x + (8 - (In4 - l))A min = 1 / 4 A > _2_ 

2p 2 Amax — A m in 2 — 1 / 

where the last inequality holds for k large enough (namely for k > 10 ). 0 

The theorem above states that the convergence rate of the approximation depends 
on the condition number of the shifted matrix. 

Remark 6.7. If f(A)b is approximated in the Krylov subspace K m (A,b), then 
the error norm can be written as 

/»oo 

\\f(A)b-Xm\\< / \\e~ rA b - V m e _r ^ m V^ 6 ||da. 

J o 

Therefore, all the previous steps can be replicated, leading to an estimate of the type 

\\f(A)b- Xm\\ ~ exp (~-^=) 

where now k = A max /A m i n . The improvement in the convergence rate when exploit¬ 
ing the Kronecker form thus becomes readily apparent, with the shift acting as an 
“accelerator”. It is also important to realize that the error norm ||/(Mi) 6 i — x^\\ is 
also driven by the same quantity k, since the condition number of Mi and A is the 
same. Therefore, it is really by using the Kronecker form that convergence becomes 
faster. □ 

We next illustrate our findings with a simple example. A diagonal matrix is 
considered so as to be able to compute exact quantities, while capturing the linear 
convergence of the approximation. 

Example. We consider /( x) = 1/i/x (so that 7 = 3/2) and a diagonal matrix 
M\ of size n = 500 with logarithmically distributed eigenvalues in [10 1 ,10 3 ], giving 
p « 247; bi is the vector of all ones, normalized to have unit norm. We wish to 
approximate f(A)b , with b = vec(bibj). We compare the convergence curves of 
the usual approximation x m = V m f(H m )e 1 (solid thin line), with that of x® = 
(Qm < 8 > Qm)f{Tm)b (dashed thick line). As expected, the convergence rate of the 
latter is better (smaller) than for the standard method. The estimate in (16.611 is 
reported as thick crosses, and it well approximates the correct convergence slope. 
For completeness we also report the error norm ||/(Mi)&i — QmfiT^Q^biW, which 
behaves like that of the standard method, as noticed in the previous remark. □ 


2 We assume here that m — 7 is a positive integer, otherwise we can take the Gamma function 
associated with the closest integer larger than m — 7 . 
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Fig. 6.1. Convergence curves for various approximation methods, and estimate in JUl). 

6.1.2. Rational Krylov subspace approximation. Convergence bounds of 
Laplace-Stieltjes functions are harder to obtain when rational Krylov subspaces are 
used, for few results on error bounds for the exponential functions are available. In 
this section we discuss some of the results that can be obtained. However, we will show 
that additional results can be derived for the subclass of Cauchy-Stieltjes functions, 
since they involve inverses in place of exponential functions. 

If the approximation space is the extended Krylov subspace, which can be defined 
as K m (Mi,bi) + error bounds are difficult to obtain for Laplace- 

Stieltjes functions, unless da =dr (that is, 7 = 0). Indeed, for 7 = 0, we have 



f(A)b - x® = vec 


The integral coincides with the error matrix in the approximation of the numerical 
solution to the Lyapunov equation in the space Range(P m ). This connection was 
already used in Lemma 16.31 to establish an upper bound for the Krylov subspace 
approximation. Here, we just mention that an asymptotic bound can be obtained by 
using results in [27l ESI S3], giving 



where k = A max (.4)/A m i n (.4) = A max (Mi)/A m i n (Mi). Note that while there is no ben¬ 
eficial shifted matrix in this bound, the fourth root of k appears, ensuring significantly 
faster convergence rate than for standard Krylov subspaces. 

For 7 > 0, lack of explicit error bounds for the exponential function leaves the 
derivation of bounds for our setting an open problem. Nonetheless, experimental 
evidence seems to suggest a convergence rate similar to the one for 7 = 0 . 

Upper bounds for the exponential function when rational Krylov subspaces (12.211 
are employed are usually asymptotic and specialized to optimal values of the param¬ 
eters 01 ,..., We refer the reader to [HI sec.4.2] for more details. 

6.2. Convergence analysis for Cauchy Stieltjes functions. Functions be¬ 
longing to the Cauchy-Stieltjes class provide a more favorable setting, as they are 
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based on the resolvent function. Assume then that / is a Cauchy-Stieltjes function. 
Using the definition in (16.31) and assuming that b = vec(&i bj), we can write 

f(A)b -x®= f(A)b - (P m ® Pm)f(r m )(P m ® Pm) T b 


/ 0 pO 

(A- OjI)~ 1 bdj(u}) - (Pm. ® Pm) / (Tm - Ojiy 1 (P m ® Pm) T bd'y(oj) 

-oo J — oo 

0 

({A - Ldl)~ 1 b - ( P m ® Pm) (7m ~ w/) _1 (P m <8> P m ) T b) d 7 (w). 


Let M{u>) = M\ — so that for any v and V such that v = vec(V) we can write 
(7m — ujI)v = vec(M(oj)V + VM(oj) t ). Then, recalling the derivation in (15.31) . we 
obtain 

f(A)b-x® = f vec(X(w) - X m (w))d 7 (w), 

<7 —OO 

where X{u>) and X m (oj) are the exact and approximate solutions to the linear matrix 
equation M(u>)X + IM(w) T = bibj when the space K m {Mi — jI , bi) is used. Note 
that this space is invariant under shift, that is AT m (M 1 , 6 1 ) = K m (Mi — uil, &i), so 
that the columns of P m are still a basis for this space. An upper bound for the error 
can then be obtained as 

\\f(A)b-x®\\< / ||XM-Xm( W )||fd 7 (o;). 

<7—00 


Available convergence bounds for the approximation of X(u) onto standard and ra¬ 
tional Krylov subspaces can be employed as a first step towards an upper bound for 
the error norm above. Final bounds will then be obtained for specific choices of /, 
which yield specific functions 

For standard Krylov subspaces, we can once again use [33] Proposition 3.1] to get 


\\f(A)b-x®\\< 


Ktj) ~\~ 1 


(^min 2 


1 

/ Kcu + 1 


d 7 (w), 


where = (A max + A m ; n — \ w)/(A m i n + A m ; n — ^oj ). For consistency with the previous 
notation, we shall use kq = k. Therefore, 


\\f(A)b-x®\\<2 


'V^-1 
Vk + 1 


Ktjj + 1 


d 7 (w), 


(6.7) 


I— oo (Amin 2 ^) 

where ^ ss exp(— 2m/V k) for '/k large. The final upper bound can be ob¬ 

tained once the measure d 7 (w) is made more explicit, and this may influence the 
integration interval as well. For instance, for f(x) = x~^, 

f° Vk-uj + 1 a i \ \/Ku +1 

=d 7 (w) = / tt-TT7 , 

; <7—oo v-^min V ^ 

r° i i 


' rxj (A m i n 


=dw 


< 2 


= 2 


< 


/ oc (Amin ' 


f —1 


— oo (Amin 2^) ' 

- 1 1 1 1 
---— . dcu -f- 2 —— 


=dw 


=dcu T 2 


— 1 (Amin 2^) ^ ^ 


=dw 


( —w) V— w 


r° 

-1 V-v 


1 , „ 4 

du.' — 2 + —— 
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Since the inverse square root is both a Laplace-Stieltjes function and a Cauchy- 
Stieltjes function, it is not surprising that we get a similar convergence rate. The 
setting of this section allows one to determine a simpler expression for the bound. 

Following similar steps as for the inverse square root, for f(x) = ln(l + x)/x we 
have (note the change in the integration interval) 


c -1 


Hui + 1 


f— oo (Amin o^)V^u; 


d7(w) = 


Hui + 1 


— oo (A m i n 2 ^)\/'^'u ^ 


duj 


< 2 


— oo (Amin o^) ( ^0 


dee <2-2 = 4. 


In both cases, the integral appearing in (16.71) is bounded by a constant of modest 
size, unless A m i n is tiny in the inverse square root. Summarizing, when using stan¬ 
dard Krylov subspace approximation, \\f(A)b — x® || is bounded by a quantity whose 


asymptotic term is 



as m grows. 


When using the extended Krylov subspace for an Hermitian positive definite 


M, this quantity is 


replaced by (-f§+i) > 


where m is the subspace dimension and 


k = Amax/Amin (see, e.g., ES],[2H]), as conjectured in the case of Laplace-Stieltjes 
functions. Here, an explicit upper bound can actually be obtained. 

Rational Krylov subspaces can also be used, and the term ||X — A®||f can be 
estimated using, e.g., 0 Theorem 4.9]. 


7. Conclusions. In this paper we have shown how to take advantage of the 
Kronecker sum structure in A when using Krylov subspace methods to evaluate ex¬ 
pressions of the form f(A)b. Special attention has been devoted to the important 
case of the matrix exponential. Numerical experiments demonstrate that consid¬ 
erable savings can be obtained when the Kronecker sum structure is exploited. A 
detailed analysis of the convergence rate of the new approximation for symmetric (or 
Hermitian) and positive definite matrices was also proposed. 

Finally, while we have limited our presentation to the case where A is the Kro¬ 
necker sum of two matrices, the same observations and techniques apply to the more 
general case where A is the Kronecker sum of three or more summands, since this can 
be reduced to the Kronecker sum of two matrices. For instance, if 


A = M x ® M 2 ® M 3 := Mx® I ® I + I ® M 2 ® I + I ® I® m 3 , 


we can write 

A = Mi ® (I ® I) + I ® (M 2 ®I + I® Ms) =: M x ® I + I <g> M, 

and apply the techniques in this paper in a recursive fashion. 

Acknowledgement. The authors would like to thank Paola Boito for her careful 
reading of the manuscript and helpful comments. 

Appendix. In this appendix we prove Lemma |6.5I 
Lemma 16.51 For 0 < x < an with 0 < a < 1 it holds that 


l(n, x) < 


1 — an 
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Proof. We have 


7 (n,x) = (n- l)!e x f e x - ^ L ) = (n - l)!e x ( 


k =0 


\k=r 


X 

IF 


= (n — l)!e _x ' 


n\ 


*+£ 


(n + 1) ■ ■ ■ (n + j) 


< e 


< e 


i+£< 


,? = 1 


£< 

0=0 


(n + 1) ■ ■ ■ (n + j) 


. x n 1 


n 1 — a 
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